Packages and Data Loading

As mentioned in the session setup, load the following packages using the library() function. Additionally, because we will be using a data set containing large numbers, set scipen to 999 using the options() function so that values are printed in full rather than in scientific notation.

  library(tidyverse)
  library(RColorBrewer)

  options(scipen = 999)
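
To see what the scipen option changes, the short sketch below formats one large number under the default setting and again with scipen = 999. The value 100000000 is an arbitrary illustration and is not taken from the gaming data.

```r
# Remember the current setting so it can be restored afterwards
old_opts <- options(scipen = 0)

# With the default penalty, R prefers the shorter scientific notation
default_format <- format(100000000)

# A large scipen value makes fixed notation win instead
options(scipen = 999)
fixed_format <- format(100000000)

print(default_format)
print(fixed_format)

# Restore whatever was set before running this sketch
options(old_opts)
```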

As mentioned previously, this session uses a sample of the Online Gaming Anxiety Data. This subset is available as part of the .zip file downloadable from the GitHub repo. It can then be loaded using the following script:

gamingdata_samp5000 <- read_csv("../../data/gamingdata_samp5000.csv")
gamingdata_samp100 <- read_csv("../../data/gamingdata_samp100.csv")

You are welcome to use either the large sampled version (n = 5000) or the small sampled version (n = 100). The solutions in this practical use the large sampled version.


Section 5: Layering different geoms

Exercise 11.

Using the code developed in Section 2 (Age, Hours and SPIN Total Score), add an additional geom_jitter() layer which plots the GAD score.

  ggplot(data = gamingdata_samp5000) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = SPIN_T,
                       colour = Gender,
                       size = Hours)) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = GAD_T,
                       colour = Gender,
                       size = Hours)) + 
  labs(x = "Age", y = "SPIN Total Score", 
       title = "Age in Relation to SPIN Total Score")
## Warning: Removed 252 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing missing values (geom_point).
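
Each warning reports the rows a layer dropped because one of its mapped variables is missing (NA), and one warning is issued per layer. As a minimal sketch of how to check this before plotting, the code below counts missing values on a small made-up data frame; toy_data and its values are hypothetical and only stand in for gamingdata_samp5000.

```r
# Hypothetical stand-in for the gaming data, with a few missing values
toy_data <- data.frame(
  Age    = c(18, 22, NA, 30, 27),
  SPIN_T = c(10, NA, 20, NA, 41),
  GAD_T  = c(3, 12, 7, 15, 9),
  Hours  = c(5, 20, 14, NA, 30)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy_data))
print(na_per_column)

# Rows that a layer mapping Age, SPIN_T and Hours would have to drop
spin_vars <- c("Age", "SPIN_T", "Hours")
rows_dropped <- sum(!complete.cases(toy_data[, spin_vars]))
print(rows_dropped)
```

Running the same two checks on the real data, with the variables used in each layer, should show where the counts in the warnings come from.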

Exercise 12.

Expanding on the code from Exercise 11, replace the colour variable (currently Gender) with a colour to differentiate between SPIN and GAD. You can add Gender back in as a shape variable (see the ggplot2 documentation on point shapes for options).

  ggplot(data = gamingdata_samp5000) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = SPIN_T,
                       colour = "red",
                       size = Hours, 
                       shape = Gender)) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = GAD_T,
                       colour = "blue",
                       size = Hours, 
                       shape = Gender)) + 
  labs(x = "Age", y = "SPIN Total Score", 
       title = "Age in Relation to SPIN Total Score")
## Warning: Removed 252 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing missing values (geom_point).

Exercise 13.

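Directly swapping the variable Gender for a colour name (“blue” or “red”) inside aes() does not give the intended result: ggplot2 treats the colour name as a data value rather than a literal colour, so the points take colours from the default palette and the legend is not usable. As such, move the colour parameter out of the mapping = aes() component and set it as a fixed argument of each geom_jitter() layer.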

  ggplot(data = gamingdata_samp5000) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = SPIN_T,
                       size = Hours, 
                       shape = Gender),
                       colour = "red") + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = GAD_T,
                       size = Hours, 
                       shape = Gender),
                       colour = "blue") + 
  labs(x = "Age", y = "SPIN Total Score", 
       title = "Age in Relation to SPIN Total Score")
## Warning: Removed 252 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing missing values (geom_point).
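
The difference between setting colour inside and outside aes() can be inspected directly. The sketch below uses a tiny made-up data frame (toy_data is hypothetical) and geom_point() rather than geom_jitter() so the result is not random; layer_data() returns the values ggplot2 will actually draw for a layer.

```r
library(ggplot2)

# Hypothetical data, only to make the example self-contained
toy_data <- data.frame(Age = c(18, 22, 25), SPIN_T = c(10, 35, 20))

# Colour inside aes(): "red" is treated as a data value and sent through a colour scale
p_mapped <- ggplot(data = toy_data) +
  geom_point(mapping = aes(x = Age, y = SPIN_T, colour = "red"))

# Colour outside aes(): "red" is used as the literal colour of every point
p_fixed <- ggplot(data = toy_data) +
  geom_point(mapping = aes(x = Age, y = SPIN_T), colour = "red")

# Compare the colours each plot will really use for its first layer
mapped_colours <- layer_data(p_mapped, 1)$colour
fixed_colours  <- layer_data(p_fixed, 1)$colour

print(unique(mapped_colours))
print(unique(fixed_colours))
```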

Exercise 14.

Although this corrects the colours, there is still no legend explaining them. To add a colour-specific legend, return the colour parameter to inside mapping = aes() and add scale_colour_identity(), which tells ggplot2 to use the mapped values as literal colours. Use the given template to specify what each colour indicates. Additionally, correct the axis names so they represent the contents of the plot.

 ggplot(data = gamingdata_samp5000) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = SPIN_T,
                       size = Hours, 
                       shape = Gender,
                       colour = "red")) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = GAD_T,
                       size = Hours, 
                       shape = Gender,
                       colour = "blue")) +
  labs(x = "Age", y = "Total Anxiety Measure Score", 
       title = "Age in Relation to SPIN Total Score") + 
  scale_colour_identity(name = "Anxiety Measure", 
                        breaks = c("red", "blue"), 
                        labels = c("SPIN", "GAD"),
                        guide = "legend")
## Warning: Removed 252 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing missing values (geom_point).
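
A different route to the same kind of legend, not the one used in this exercise, is to reshape the two score columns into long format so that the measure becomes an ordinary variable mapped to colour. The sketch below assumes tidyr's pivot_longer() and uses a made-up data frame; the names toy_data, Measure and Score are hypothetical.

```r
library(ggplot2)
library(tidyr)

# Hypothetical data with one column per anxiety measure
toy_data <- data.frame(
  Age    = c(18, 22, 25, 30),
  SPIN_T = c(10, 35, 20, 50),
  GAD_T  = c(3, 12, 7, 15)
)

# Stack the two score columns: one row per person per measure
toy_long <- pivot_longer(toy_data,
                         cols = c(SPIN_T, GAD_T),
                         names_to = "Measure",
                         values_to = "Score")

# A single layer now covers both measures, and colour is a normal mapping
p_long <- ggplot(data = toy_long) +
  geom_jitter(alpha = 0.5,
              mapping = aes(x = Age, y = Score, colour = Measure)) +
  scale_colour_manual(name = "Anxiety Measure",
                      values = c(SPIN_T = "red", GAD_T = "blue"),
                      labels = c(SPIN_T = "SPIN", GAD_T = "GAD")) +
  labs(x = "Age", y = "Total Anxiety Measure Score")
```

The trade-off is an extra reshaping step in exchange for a single layer and a legend that is built from the data rather than from hard-coded colour names.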

Exercise 15.

Colours and legends can also be used for continuous variables. Return to just a single variable (SPIN_T), and specify the colour parameter as the hours variable (Hours). Is this clearer than when Hours was specified as the size variable?

 ggplot(data = gamingdata_samp5000) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = SPIN_T, 
                       shape = Gender,
                       colour = Hours)) + 
  labs(x = "Age", y = "Total Anxiety Measure Score", 
       title = "Age in Relation to SPIN Total Score")
## Warning: Removed 241 rows containing missing values (geom_point).

Exercise 16.

To improve the clarity of the scale, you can use the function scale_colour_gradient() to specify the colours of the low and high points as well as any specific breaks. Add this function to the previous plot (Ex.15), using the template provided. Think about which breaks would be sensible to use within this function.

 ggplot(data = gamingdata_samp5000) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = SPIN_T, 
                       shape = Gender,
                       colour = Hours)) + 
  labs(x = "Age", y = "Total Anxiety Measure Score", 
       title = "Age in Relation to SPIN Total Score") + 
  scale_colour_gradient(low = "Yellow", high = "Red",
                        breaks = c(0, 50, 100, 150))
## Warning: Removed 241 rows containing missing values (geom_point).
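
One way to think about sensible breaks is to look at the range of the variable first and let base R suggest round values. The sketch below does this for a made-up vector (toy_hours is hypothetical and stands in for the Hours column); the suggested values are only a starting point to adjust by eye.

```r
# Hypothetical weekly hours, standing in for the Hours column
toy_hours <- c(2, 10, 14, 25, 40, 70, NA, 140)

# Smallest and largest observed values, ignoring missing ones
hours_range <- range(toy_hours, na.rm = TRUE)

# pretty() suggests roughly n round values that span the range
candidate_breaks <- pretty(hours_range, n = 4)

print(hours_range)
print(candidate_breaks)
```

The resulting vector could then be passed to the breaks argument of scale_colour_gradient().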

Exercise 17.

An alternative to layering multiple plots on top of each other is to use facet_wrap() or facet_grid() to produce graphs side by side. Using a faceting function, you can split the graph up by a specific variable. Try using facet_grid() to split the graph produced in Ex.14 by Gender.

 ggplot(data = gamingdata_samp5000) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = SPIN_T,
                       size = Hours, 
                       shape = Gender,
                       colour = "red")) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = GAD_T,
                       size = Hours, 
                       shape = Gender,
                       colour = "blue")) +
  labs(x = "Age", y = "Total Anxiety Measure Score", 
       title = "Age in Relation to SPIN Total Score") + 
  scale_colour_identity(name = "Anxiety Measure", 
                        breaks = c("red", "blue"), 
                        labels = c("SPIN", "GAD"),
                        guide = "legend") + 
  facet_grid(.~ Gender)
## Warning: Removed 252 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing missing values (geom_point).

Exercise 18.

Alternatively, you can split by a factor that is not otherwise used in the graph. Try using facet_grid() again, this time to split the same graph by Platform.

 ggplot(data = gamingdata_samp5000) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = SPIN_T,
                       size = Hours, 
                       shape = Gender,
                       colour = "red")) + 
  geom_jitter(alpha = 0.5,
         mapping = aes(x = Age,
                       y = GAD_T,
                       size = Hours, 
                       shape = Gender,
                       colour = "blue")) +
  labs(x = "Age", y = "Total Anxiety Measure Score", 
       title = "Age in Relation to SPIN Total Score") + 
  scale_colour_identity(name = "Anxiety Measure", 
                        breaks = c("red", "blue"), 
                        labels = c("SPIN", "GAD"),
                        guide = "legend") + 
  facet_grid(.~ Platform)
## Warning: Removed 252 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing missing values (geom_point).
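
The two faceting functions mentioned above differ mainly in how panels are arranged. As a minimal sketch on made-up data (toy_data and its values are hypothetical), facet_wrap() lays out the levels of one variable in a wrapped sequence of panels, while facet_grid() takes a rows ~ columns formula and can cross two variables.

```r
library(ggplot2)

# Hypothetical data with two grouping variables
toy_data <- data.frame(
  Age      = c(18, 22, 25, 30, 27, 19),
  SPIN_T   = c(10, 35, 20, 50, 41, 15),
  Gender   = c("Female", "Male", "Female", "Male", "Other", "Female"),
  Platform = c("PC", "PC", "Console", "PC", "Console", "PC")
)

base_plot <- ggplot(data = toy_data) +
  geom_point(mapping = aes(x = Age, y = SPIN_T))

# facet_wrap(): one variable, panels wrapped into as many rows as needed
p_wrap <- base_plot + facet_wrap(~ Gender)

# facet_grid(): rows ~ columns; here Platform forms the rows and Gender the columns
p_grid <- base_plot + facet_grid(Platform ~ Gender)
```

Printing p_wrap or p_grid draws the corresponding plot. Using a dot on one side of the formula, as in facet_grid(. ~ Gender), leaves that dimension unsplit.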

Section 6: Publishing Plots from ggplot2

Exercise 19a.

Using the function ggsave(), export your last plot (by passing last_plot() to the plot parameter). Use the template provided to structure your export.

ggsave(filename = "test_plot.png", 
       plot = last_plot())

Exercise 19b.

Alternatively, you can assign a specific graph to a variable (for example, plot_to_print) and pass it in place of last_plot(). You can also change parameters such as the height and width, dpi, scale and type of export. Try to export the plot you made for Ex.15 as a JPEG (.jpeg), with a dpi of 500 and a size of 5cm x 10cm.

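# plot_to_print must already hold the plot from Ex.15
ggsave(filename = "test_plot2.jpeg",
       plot = plot_to_print,
       height = 5, 
       width = 10, 
       units = "cm",  # ggsave() defaults to inches, so state centimetres explicitly
       dpi = 500)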
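
For completeness, the sketch below shows the whole sequence on made-up data: a plot is assigned to plot_to_print and then passed to ggsave(). The data frame toy_data is hypothetical, and the file is written to a temporary location (via tempfile()) rather than to test_plot2.jpeg, so that running the sketch leaves the working directory untouched.

```r
library(ggplot2)

# Hypothetical data, only to make the example self-contained
toy_data <- data.frame(Age    = c(18, 22, 25, 30),
                       SPIN_T = c(10, 35, 20, 50),
                       Hours  = c(5, 20, 14, 40))

# Assign the plot to a variable instead of printing it
plot_to_print <- ggplot(data = toy_data) +
  geom_point(mapping = aes(x = Age, y = SPIN_T, colour = Hours))

# Temporary output path with a .jpeg extension, which selects the JPEG device
out_file <- tempfile(fileext = ".jpeg")

# units = "cm" is needed because height and width are read as inches by default
ggsave(filename = out_file,
       plot = plot_to_print,
       height = 5,
       width = 10,
       units = "cm",
       dpi = 500)

print(file.exists(out_file))
```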

Exercise 20.

Alternatively, you can use the Export function in the RStudio UI to export your plots. To do this, click Export in the Plots tab (if plots are produced there) and follow the prompts. If you are using the accompanying worksheet (which is a .R script), try saving any of your previous plots as .pdf files following the instructions in RStudio.